<!--begin.rcode results='hide', echo=FALSE, message=FALSE
library(caret)
data(BloodBrain)

hook_inline = knit_hooks$get('inline')
knit_hooks$set(inline = function(x) {
  if (is.character(x)) highr::hi_html(x) else hook_inline(x)
  })
opts_chunk$set(comment=NA)

session <- paste(format(Sys.time(), "%a %b %d %Y"),
                 "using caret version",
                 packageDescription("caret")$Version,
                 "and",
                 R.Version()$version.string)
    end.rcode-->

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--
    Design by Free CSS Templates
    http://www.freecsstemplates.org
    Released for free under a Creative Commons Attribution 2.5 License

    Name       : Emerald 
    Description: A two-column, fixed-width design with dark color scheme.
    Version    : 1.0
    Released   : 20120902

  -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="keywords" content="" />
    <meta name="description" content="" />
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Measuring Model Performance</title>
    <link href='http://fonts.googleapis.com/css?family=Abel' rel='stylesheet' type='text/css' />
    <link href="style.css" rel="stylesheet" type="text/css" media="screen" />
  </head>
  <body>
    <div id="wrapper">
      <div id="header-wrapper" class="container">
  <div id="header" class="container">
	  <div id="logo">
	    <h1><a href="#">Measuring Model Performance</a></h1>
	  </div>
          <!--
	      <div id="menu">
		<ul>
		  <li class="current_page_item"><a href="#">Homepage</a></li>
		  <li><a href="#">Blog</a></li>
		  <li><a href="#">Photos</a></li>
		  <li><a href="#">About</a></li>
		  <li><a href="#">Contact</a></li>
		</ul>
	      </div>
              -->
	</div>
	<div><img src="images/img03.png" width="1000" height="40" alt="" /></div>
      </div>
      <!-- end #header -->
      <div id="page">
	<div id="content">
	
<h1>Contents</h1>  
<ul>
  <li><a href="#test">Evaluating Test Sets</a></li> 
  <li><a href="#lift">Evaluating Class Probabilities</a></li>
 </ul>   
      
<div id="test"></div>   
<h1>Evaluating Test Sets</h1>
<p>
A function, <span class="mx funCall">postResample</span>, can be used to obtain the same
performance measures as generated by <span class="mx funCall">train</span> for regression or
classification. 
</p>
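<p>
As a rough sketch of what such measures look like for a regression model, the root mean squared error and R-squared can be computed by hand in base R. The vectors <code>obs_demo</code> and <code>pred_demo</code> are made-up values used only for illustration, and the squared-correlation form of R-squared is an assumption here; the exact set of statistics reported by <span class="mx funCall">postResample</span> may differ between package versions.
</p>
<!--begin.rcode perf_sketch_resample, tidy=FALSE
## Hypothetical observed and predicted values (illustration only)
obs_demo  <- c(3.1, 4.7, 5.2, 6.8, 8.0)
pred_demo <- c(2.9, 5.0, 5.0, 7.1, 7.6)

## Root mean squared error and squared correlation
rmse_demo <- sqrt(mean((pred_demo - obs_demo)^2))
rsq_demo  <- cor(pred_demo, obs_demo)^2
c(RMSE = rmse_demo, Rsquared = rsq_demo)
    end.rcode-->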
<p>
<a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a> also contains several functions that can be used to
describe the performance of classification models. The functions
<span class="mx funCall">sensitivity</span>, <span class="mx funCall">specificity</span>, <span class="mx funCall">posPredValue</span> and
<span class="mx funCall">negPredValue</span> can be used to characterize performance where
there are two classes. By default, the first level of the outcome
factor is used to define the "positive" result (i.e. the event of
interest), although this can be changed.  
</p>
<p>
The function <span class="mx funCall">confusionMatrix</span> can also be used to summarize
the results of a classification model. This example uses objects from the webpage for <a href="training.html">Model Training and Tuning</a>: 
</p>
<!--begin.rcode other_conf00,eval=FALSE
testPred <- predict(gbmFit3, testing)
    end.rcode-->
<!--begin.rcode other_conf0
postResample(testPred, testing$Class)

sensitivity(testPred, testing$Class)

confusionMatrix(testPred, testing$Class)
    end.rcode-->

<p>
The "no-information rate" is the largest proportion of the observed
classes (there were more actives than inactives in this test set). A
hypothesis test is also computed to evaluate whether the overall
accuracy rate is greater than the rate of the largest class. The output
also reports the prevalence of the "positive event" (computed from the
data unless passed in as an argument), the detection rate (the rate of
true events also predicted to be events) and the detection prevalence
(the prevalence of predicted events). 
</p>
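<p>
The no-information rate and the accompanying test can be sketched in base R. The table <code>cm_demo</code> below is hypothetical, and the use of a one-sided exact binomial test is an assumption about how such a comparison could be carried out, not a statement of the package internals.
</p>
<!--begin.rcode perf_sketch_nir, tidy=FALSE
## Hypothetical 2x2 table: rows are predictions, columns are observed classes
cm_demo <- matrix(c(40, 10,
                     5, 25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Prediction = c("active", "inactive"),
                                  Reference  = c("active", "inactive")))

n_demo   <- sum(cm_demo)
acc_demo <- sum(diag(cm_demo)) / n_demo      # overall accuracy
nir_demo <- max(colSums(cm_demo)) / n_demo   # largest observed class proportion

## One-sided test of whether accuracy exceeds the no-information rate
nir_test <- binom.test(sum(diag(cm_demo)), n_demo,
                       p = nir_demo, alternative = "greater")
c(Accuracy = acc_demo, NoInformationRate = nir_demo,
  PValue = nir_test$p.value)
    end.rcode-->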
<p>
Suppose a 2x2 table with the following notation:
</p>		

<p><img width="234" height="100" src="table.png" alt="Notation for the cells of the 2x2 table" /></p>

<p>The formulas used here are:</p>  
  
<p><img width="669" height="285" src="cm.png" alt="Formulas for the two-class performance statistics" /></p>
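<p>
For readers who prefer code to formulas, the same kind of statistics can be written as a small base R function. This is a sketch under stated assumptions: the cell labels <code>A</code> to <code>D</code> and their layout (given in the comments) are assumed and may not match the image above, the helper <code>two_class_stats</code> is hypothetical, and the counts are invented.
</p>
<!--begin.rcode perf_sketch_formulas, tidy=FALSE
## Assumed cell layout (columns are the reference classes):
##                 Reference
## Predicted       Event   No Event
##   Event           A        B
##   No Event        C        D
two_class_stats <- function(A, B, C, D) {
  total <- A + B + C + D
  sens  <- A / (A + C)
  spec  <- D / (B + D)
  prev  <- (A + C) / total
  c(Sensitivity         = sens,
    Specificity         = spec,
    Prevalence          = prev,
    PosPredValue        = (sens * prev) /
                          ((sens * prev) + ((1 - spec) * (1 - prev))),
    NegPredValue        = (spec * (1 - prev)) /
                          (((1 - sens) * prev) + (spec * (1 - prev))),
    DetectionRate       = A / total,
    DetectionPrevalence = (A + B) / total,
    BalancedAccuracy    = (sens + spec) / 2)
}

## Hypothetical counts
stats_demo <- two_class_stats(A = 40, B = 10, C = 5, D = 25)
round(stats_demo, 3)
    end.rcode-->
<p>
Writing the predictive values in terms of prevalence makes it possible to substitute a prevalence other than the one observed in the data.
</p>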

<p>
When there are three or more classes, <span class="mx funCall">confusionMatrix</span> will
show the confusion matrix and a set of "one-versus-all"
results. For example, in a three class problem, the sensitivity of the
first class is calculated against all the samples in the second and
third classes (and so on). 
</p>
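<p>
The "one-versus-all" idea can be illustrated by collapsing a three-class table into a series of two-class problems. The table <code>cm3_demo</code> and its class labels are invented for this sketch, which uses only base R.
</p>
<!--begin.rcode perf_sketch_onevsall, tidy=FALSE
## Hypothetical three-class table: rows are predictions, columns are observed classes
cm3_demo <- matrix(c(30,  4,  1,
                      3, 25,  6,
                      2,  5, 24),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Prediction = c("a", "b", "c"),
                                   Reference  = c("a", "b", "c")))

## Treat each class in turn as the event and everything else as the non-event
one_vs_all <- sapply(colnames(cm3_demo), function(k) {
  tp <- cm3_demo[k, k]
  fn <- sum(cm3_demo[, k]) - tp   # class k samples predicted as another class
  fp <- sum(cm3_demo[k, ]) - tp   # other samples predicted as class k
  tn <- sum(cm3_demo) - tp - fn - fp
  c(Sensitivity = tp / (tp + fn), Specificity = tn / (tn + fp))
})
round(one_vs_all, 3)
    end.rcode-->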
<p>
A resampled estimate of the training set confusion matrix can also be
obtained using <span class="mx funCall">confusionMatrix.train</span>. For each resampling iteration,
a confusion matrix is created from the hold-out samples and these
values can be aggregated to diagnose issues with the model fit.
</p>
<p>
For example:
</p>
<!--begin.rcode other_conf3
confusionMatrix(gbmFit3)
    end.rcode--> 

<p>
These values are the percentages of hold-out samples that landed in
each cell of the confusion matrix during resampling. There are several
methods for normalizing these values. See <code>?confusionMatrix.train</code> for details.
</p>  



<p>
For multi-class problems, there are additional functions that can be used to calculate performance. One, <span class="mx funCall">mnLogLoss</span>, computes the negative of the multinomial log-likelihood (smaller is better) based on the class probabilities. This can be used to optimize tuning parameters but can lead to results that are inconsistent with other measures (e.g. accuracy or the area under the ROC curve), especially when the other measures are near their best possible values. The function has similar arguments to the other functions described above:
</p>

<!--begin.rcode other_logLoss
test_results <- predict(gbmFit3, testing, type = "prob")
test_results$obs <- testing$Class
head(test_results)
mnLogLoss(test_results, lev = levels(test_results$obs))
end.rcode--> 
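<p>
To make the quantity concrete, the negative multinomial log-likelihood can be sketched by hand in base R. The probabilities and labels below are invented, and the lower bound used to keep the logarithm finite is an arbitrary choice for this illustration rather than a documented setting of <span class="mx funCall">mnLogLoss</span>.
</p>
<!--begin.rcode perf_sketch_logloss, tidy=FALSE
## Hypothetical class probabilities for four samples and three classes
prob_demo <- data.frame(a = c(0.7, 0.2, 0.1, 0.5),
                        b = c(0.2, 0.6, 0.3, 0.3),
                        c = c(0.1, 0.2, 0.6, 0.2))
obs_class <- factor(c("a", "b", "c", "b"), levels = c("a", "b", "c"))

## Probability assigned to the observed class of each sample
p_obs <- as.matrix(prob_demo)[cbind(seq_along(obs_class),
                                    as.integer(obs_class))]

## Bound away from zero so log() stays finite (arbitrary bound)
p_obs <- pmax(p_obs, 1e-15)

## Average negative log-probability of the observed classes
logloss_demo <- -mean(log(p_obs))
logloss_demo
    end.rcode-->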

<p>
Additionally, the function <span class="mx funCall">multiClassSummary</span> computes a number of relevant metrics:
</p>
<ul>
  <li>the overall accuracy and Kappa statistics using the predicted classes</li>
  <li>the negative of the multinomial log-likelihood (if class probabilities are available)</li>
  <li>averages of the "one-versus-all" statistics such as sensitivity, specificity, the area under the ROC curve, etc.</li>
</ul>

<p>
For example:
</p>

<!--begin.rcode other_multiclass
test_results$pred <- predict(gbmFit3, testing)
multiClassSummary(test_results, lev = levels(test_results$obs))
end.rcode--> 




	
<div id="lift"></div>    
<h1>Evaluating Class Probabilities</h1>
<p>  
The package also contains two functions for evaluating class probability predictions for data sets with two classes.
</p>
<p>  
The <span class="mx funCall">lift</span> function can be used to evaluate probability thresholds that can capture a certain percentage of <i>hits</i>. The function requires a set of sample probability predictions (not from the training set) and the true class labels. For example, we can simulate two-class samples using the <span class="mx funCall">twoClassSim</span> function and fit a set of models to the training set:
</p>
<!--begin.rcode lift_fits, tidy=FALSE
set.seed(2)
trainingSim <- twoClassSim(1000)
evalSim     <- twoClassSim(1000)
testingSim  <- twoClassSim(1000)

ctrl <- trainControl(method = "cv",
                     classProbs = TRUE, 
                     summaryFunction = twoClassSummary)

set.seed(1045)
fdaModel <- train(Class ~ ., data = trainingSim, 
                  method = "fda",
                  metric = "ROC",
                  tuneLength = 20,
                  trControl = ctrl)
set.seed(1045)
ldaModel <- train(Class ~ ., data = trainingSim, 
                  method = "lda",
                  metric = "ROC",
                  trControl = ctrl)

set.seed(1045)
c5Model <- train(Class ~ ., data = trainingSim, 
                 method = "C5.0",
                 metric = "ROC",
                 tuneLength = 10,
                 trControl = ctrl)

## A summary of the resampling results:
getTrainPerf(fdaModel)
getTrainPerf(ldaModel)
getTrainPerf(c5Model)

    end.rcode--> 
<p>
From these models, we can predict the evaluation set and save the probabilities of being the first class:
</p>
<!--begin.rcode lift_preds, tidy=FALSE
evalResults <- data.frame(Class = evalSim$Class)
evalResults$FDA <- predict(fdaModel, evalSim, type = "prob")[,"Class1"]
evalResults$LDA <- predict(ldaModel, evalSim, type = "prob")[,"Class1"]
evalResults$C5.0 <- predict(c5Model, evalSim, type = "prob")[,"Class1"]
head(evalResults)
    end.rcode--> 
<p>
The <span class="mx funCall">lift</span> function does the calculations and the corresponding <span class="mx funCall">plot</span> function is used to plot the lift curve (although some call this the <i>gain</i> curve). The <span class="mx arg">values</span> argument creates reference lines.
</p>    
<!--begin.rcode lift_plot, tidy=FALSE,fig.width=8,fig.height=5
trellis.par.set(caretTheme())
liftData <- lift(Class ~ FDA + LDA + C5.0, data = evalResults)
plot(liftData, values = 60, auto.key = list(columns = 3,
                                            lines = TRUE,
                                            points = FALSE))
    end.rcode--> 
<p>
From this we can see that, to find 60 percent of the hits, a little more than 30 percent of the data must be sampled (when ordered by the probability predictions). The LDA model does somewhat worse than the other two models. 
</p>
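<p>
The calculation behind such a curve can be sketched in base R: order the samples by predicted probability, then track the cumulative percentage of events found against the percentage of samples tested. The data frame <code>gain_demo</code> is invented for this illustration and is unrelated to the simulated data above.
</p>
<!--begin.rcode perf_sketch_gain, tidy=FALSE
## Hypothetical probabilities of the event class and the true labels
gain_demo <- data.frame(
  prob  = c(0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10),
  event = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE))

## Rank samples from most to least likely to be an event
gain_demo <- gain_demo[order(gain_demo$prob, decreasing = TRUE), ]

## Cumulative percentages of samples tested and of events found
gain_demo$pct_tested <- 100 * seq_len(nrow(gain_demo)) / nrow(gain_demo)
gain_demo$pct_found  <- 100 * cumsum(gain_demo$event) / sum(gain_demo$event)

## Smallest percentage of samples needed to capture at least 60% of the events
pct_needed <- min(gain_demo$pct_tested[gain_demo$pct_found >= 60])
pct_needed
    end.rcode-->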
<p>
The other function, <span class="mx funCall">calibration</span>, is for probability calibration. Related functions can be found in the <a href = "http://finzi.psych.upenn.edu/R/library/gbm/html/calibrate.plot.html"><strong>gbm</strong></a> package, the <a href = "http://finzi.psych.upenn.edu/R/library/rms/html/calibrate.html"><strong>rms</strong></a> package and others. Calibration plots can be used to assess whether the value of the probability prediction is consistent with the event rate in the data. The format for the function is very similar to the <span class="mx funCall">lift</span> function:
</p>
<!--begin.rcode cal_plot, tidy=FALSE,fig.width=8,fig.height=5
trellis.par.set(caretTheme())
calData <- calibration(Class ~ FDA + LDA + C5.0, 
                       data = evalResults, 
                       cuts = 13)
plot(calData, type = "l", auto.key = list(columns = 3,
                                          lines = TRUE,
                                          points = FALSE))
    end.rcode--> 
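<p>
The idea behind a calibration plot can be sketched in base R by binning the predicted probabilities and comparing each bin's observed event rate with its mean predicted probability. The data frame <code>cal_demo</code> and the choice of four equal-width bins are assumptions made for this illustration only.
</p>
<!--begin.rcode perf_sketch_calibration, tidy=FALSE
## Hypothetical predicted probabilities and observed outcomes (1 = event)
cal_demo <- data.frame(
  prob  = c(0.05, 0.15, 0.20, 0.30, 0.45, 0.55, 0.60, 0.70, 0.85, 0.95),
  event = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1))

## Group the probabilities into equal-width bins
cal_demo$bin <- cut(cal_demo$prob, breaks = seq(0, 1, by = 0.25),
                    include.lowest = TRUE)

## Observed event rate and mean predicted probability within each bin
cal_summary <- aggregate(cbind(event, prob) ~ bin, data = cal_demo, FUN = mean)
names(cal_summary) <- c("bin", "observed_rate", "mean_predicted")
cal_summary
    end.rcode-->
<p>
For well-calibrated probabilities, the two columns should be close to one another in every bin.
</p>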
  
	  <div style="clear: both;">&nbsp;</div>
	</div>
	<!-- end #content -->
<div id="sidebar">
<ul>
  <li>
    <h2>General Topics</h2>
    <ul>
      <li><a href="index.html">Front Page</a></li>
      <li><a href="visualizations.html">Visualizations</a></li>
      <li><a href="preprocess.html">Pre-Processing</a></li>
      <li><a href="splitting.html">Data Splitting</a></li>
      <li><a href="varimp.html">Variable Importance</a></li>
      <li><a href="other.html">Model Performance</a></li>
      <li><a href="parallel.html">Parallel Processing</a></li>
    </ul>
    <h2>Model Training and Tuning</h2>
    <ul>
      <li><a href="training.html">Basic Syntax</a></li>
      <li><a href="modelList.html">Sortable Model List</a></li>
      <li><a href="bytag.html">Models By Tag</a></li>
      <li><a href="similarity.html">Models By Similarity</a></li>
      <li><a href="custom_models.html">Using Custom Models</a></li>
      <li><a href="sampling.html">Sampling for Class Imbalances</a></li> 
      <li><a href="random.html">Random Search</a></li> 
      <li><a href="adaptive.html">Adaptive Resampling</a></li> 
    </ul>
    <h2>Feature Selection</h2>
    <ul>
      <li><a href="featureselection.html">Overview</a></li>
      <li><a href="rfe.html">RFE</a></li>
      <li><a href="filters.html">Filters</a></li>
      <li><a href="GA.html">GA</a></li>
      <li><a href="SA.html">SA</a></li>
    </ul>  
  </li>
</ul>
</div>
<!-- end #sidebar -->
	<div style="clear: both;">&nbsp;</div>
      </div>
      <div class="container"><img src="images/img03.png" width="1000" height="40" alt="" /></div>
      <!-- end #page -->
    </div>
    <div id="footer-content"></div>
<!--begin.rcode echo = FALSE
knit_hooks$set(inline = hook_inline)    
    end.rcode--> 
 
    <div id="footer">
      <p>Created on <!--rinline I(session) -->.</p>
    </div>
    <!-- end #footer -->
  </body>
</html>
